GeneZip: A software package for storage-efficient processing of genotype data
Abstract
Genome-wide association studies directly assay on the order of 10^6 single nucleotide polymorphisms (SNPs) across a study cohort. Probabilistic estimation of additional sites by genotype imputation can increase this set of variants by 10- to 40-fold. Even with modest sample sizes (10^3 to 10^4), the resulting "imputed" datasets, containing 10^10 to 10^11 double-precision values, cannot be held losslessly in RAM all at once using standard methods. Existing solutions to this problem require compromises in either genotype accuracy or the complexity of the statistical methods that can be applied. Here, we present a C/C++ library that dynamically compresses probabilistic genotype data as they are loaded into memory. The method uses a customization of the DEFLATE (gzip) algorithm and maintains constant-time access to any SNP. Average compression ratios of more than 9-fold are observed in test data.
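To make the memory argument concrete, the following is a minimal back-of-envelope sketch, not code from the GeneZip library. The function names `raw_bytes` and `compressed_bytes` are hypothetical, and the inputs (number of assayed SNPs, imputation fold, sample count, one double-precision value per genotype) are illustrative assumptions chosen by the caller.

```cpp
#include <cstdint>

// Size of one IEEE 754 double-precision value, in bytes.
constexpr std::uint64_t kBytesPerDouble = 8;

// Uncompressed footprint of an imputed dataset, in bytes.
// All four parameters are illustrative assumptions supplied by the caller;
// none of them is a figure taken from the GeneZip implementation.
std::uint64_t raw_bytes(std::uint64_t assayed_snps,
                        std::uint64_t imputation_fold,
                        std::uint64_t samples,
                        std::uint64_t values_per_genotype) {
    return assayed_snps * imputation_fold * samples *
           values_per_genotype * kBytesPerDouble;
}

// Footprint after compression at a given ratio (for example 9.0 for 9-fold).
double compressed_bytes(std::uint64_t raw, double ratio) {
    return static_cast<double>(raw) / ratio;
}
```

Under the assumed inputs `raw_bytes(1000000, 10, 1000, 1)`, the uncompressed data occupy 8 x 10^10 bytes (80 GB). Dividing by a 9-fold compression ratio gives roughly 8.9 GB, which is the difference between exceeding and fitting within the RAM of a typical workstation.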